Overview

In this section, give a brief a description of your project and its goal, what data you are using to complete it, and what three faculty/staff in different fields you have spoken to about your project with a brief summary of what you learned from each person. Include a link to your final project GitHub repository.

Final Project on Github: https://github.com/Dokotela/BMIN503_Final_Project

Introduction

+In the first paragraph, describe the problem addressed, its significance, and some background to motivate the problem. It is well-known that obesity is a problem in our society. This is not just true for adults, but also for children. It is estimated that 18.5% of chldren 2-19 years old were overweight as of 20161. The rate of overweight is over 30%2. Hispanics and Non-Hispanic blacks had a higher rate of obesity than Non-Hispanic whites, and those in a lower socio-economic class. The outcomes of this are many, varied and can be severe, including a myriad of health problems along with a wide-range of socio-emotional consequences. Pediatricians are typically not familiar with the best ways to counsel and educate families about nutrition and exercise, and typically have enough time to do so. One possible solution is telehealth interventions. There have already been a few studies demonstrating improvement in adolescent physical activity or increase in fruit and vegetable consumption through telehealth interventions3,4.

+In the second paragraph, explain why your problem is interdisciplinary, what fields can contribute to its understanding, and incorporate background related to what you learned from meeting with faculty/staff. In order for these interventions to be effective, multiple different groups will need to be involved. Patients typically see their primary pediatrician as responsible for their health information, so it would be reasonable to base such an intervention out of an outpatient clinic. However, as above, pediatricians don’t generally have the expertise or time to do such counseling. One could involve psychologists, pediatricians, endocrinologists, eating disorder specialists, socail workers, other behavioral health specialists, coaches, and to do such a telehealth intervention would require some technicians to assist with setup, tech support and similar issues. We worked with nutritionists, pediatricians, and psychologists to develop our intervention, which questionnaires would be appropriate, and how to properly use motivational interviewing techniques. One of the concerns, and I’m not sure that we did this as well as we should have with the interventions, was to focus on health (eating nutritious foods and getting regular exercise) instead of focusing on weight in children.

Methods

+In the first paragraph, describe the data used and general methodological approach.

Participants

Patients were recruited from a local Federally Qualified Health Center. Inclusion criteria were age five through 17, inclusive, with a diagnosis of asthma (requiring medication) or overweight (greater than or equal to 85% for age). All study subjects had to be fluent in English or Spanish. Parents/Guardians were also required to be fluent in English or Spanish. This was determined by their own assessment.

Procedures

Prior to the study, it underwent and gained IRB approval and also an MOU and letter of support from the clinical medical director.

The clinic staff identified patients meeting the inclusion criteria. They then asked the patient’s permission to be contacted about the study. There were also fliers in the waiting room allowing the patient to contact the study team directly, and signed a release of information to allow review of medical history pertinent to the study. If the patient gave consent to contact, they were called by the study team, given an explanation of the study, and if interested were scheduled for their initial appointment.

All patients who meet the study inclusion criteria were offered an integrated approach to chronic condition management with a specialist team (a chronic condition specialist and a behavior specialist). Initially the goal was to recruit and have two arms, one with a technology component and the other without. As recruitment was more difficult than anticipated, it was changed to a single arm study.

At the initial visit, and after the study is explained and all questions answered, a written consent was obtained in English or Spanish from parents and those patients 18 years old, and assent was obtained from those patients 7-17. These were all captured on the tablet. All forms were available in English and Spanish. The patients’ history, including medical, family, genetic considerations and social history was evaluated, including a chart review. Data from the medical charts that will be extracted include: ER and hospital visits, type, frequency and dose of medications used, blood pressure, heart rate, weight, height, diet history and exercise patterns. At the initial visit for overweight children, the patients was assessed for their behaviors and attitudes regarding eating, physical activity, sedentary time and motivation. For those patients with asthma, the pediatrician also assessed understanding of the disease. At the first and last visit there were pre- and post-intervention assessment points, adherence to medical regimen, physical activity and dietary habits that were gathered. During the 4 intervening visits in addition to the first and last, quality of life of assessed. At the final visit data was also gathered aboutt he patient and parent perceptions of the program. All surveys were completed in REDCap (internet-based, secure, encrypted, HIPAA-compliant software), whether in person or online, to allow easier data analysis.

During the course of the study, the chronic condition pediatrician and behavior specialist acted in a consult capacity for their chronic condition, but the patient continued to see their regular pediatrician for primary care needs. At every visit any changes since the last visit were reviewed, including a chart review. For overweight children the specialists provided education about what is considered healthy eating, physical activity and motivation. For children with asthma, the specialists provided education about the disease itself, including symptoms, precipitating, risk and preventive factors, its natural history and proper management. The behavior health specialists then helped the patient to develop a plan on how to follow their treatment plan and achieve healthy behaviors. Problem-solving, goal-setting, and motivational interviewing techniques were used.

Each patient enrolled in the study was given access to technology in the form of a tablet computer. The tablets were installed with software to help educate patients and their families about asthma and overweight/obesity and track progress in managing them. In addition to educational apps, each tablet was preloaded with Fuze, a secure, HIPAA compliant teleconferencing app. The first and last visits were done in person at the clinic, but the four follow-up visits were conducted as a “virtual” visit via online teleconferencing. On the date of each “virtual” visit, the study participant was called 3 times to try and meet. For this group regular service and usage fees during the course of the study was covered as part of the study and was not be billed. Each participant was required to complete and sign an Equipment Check-Out Form and an End-User agreement form from Verizon. All participants were called 1 week before each appointment, then again 72 hours before the appointment.

At each visit attended, every participant received a $10.00 gift certificate.

+Subsequently, incorporate full R code necessary to retrieve and clean data, and perform analysis. Be sure to include a description of code so that others (including your future self) can understand what you are doing and why.

Measures

Columns about technology can be combined together
#this takes as arguments name of first and last column to combine, then loops through, seeing which columns are empty, and collecting them all in first column, then deleting it.
cleaner <- function(string1, string2, info){
  cols <- which(colnames(info)==string1):which(colnames(info)==string2) #get col numbers for first and last string
  info[cols] <- sapply(info[cols], as.character)
  for(j in 1:nrow(info)){
    for(i in cols){
      info[cols[1]][[1]][j] <- paste(if_else(is.na(info[cols[1]][[1]][j]), "", info[cols[1]][[1]][j]),
                                     if_else(is.na(info[i][[1]][j]), "", info[i][[1]][j]), sep="") #combine the current first column with any others that aren't null
    }
  }
  info[cols[1]][[1]] <- na_if(info[cols[1]][[1]], "") #replace all "" with NA
  info[cols[-1]] <- NULL
  info
}

#add columns as needed
survey_columns <- function(info, list) {
  for(i in 1:length(list)){
    info[[list[i]]] <- NA
  }
  info
}

#combining columns together
data$tech_kid_brand___1 = factor(data$tech_kid_brand___1,levels=c("0","1"), labels = c(NA,"iPhone or iPad"))
data$tech_kid_brand___2 = factor(data$tech_kid_brand___2,levels=c("0","1"), labels = c(NA,"Samsung"))
data$tech_kid_brand___3 = factor(data$tech_kid_brand___3,levels=c("0","1"), labels = c(NA,"LG"))
data$tech_kid_brand___4 = factor(data$tech_kid_brand___4,levels=c("0","1"), labels = c(NA,"HTC"))
data$tech_kid_brand___5 = factor(data$tech_kid_brand___5,levels=c("0","1"), labels = c(NA,"Motorola"))
data$tech_kid_brand___6 = factor(data$tech_kid_brand___6,levels=c("0","1"), labels = c(NA,"Other"))
data$tech_kid_brand___7 = factor(data$tech_kid_brand___7,levels=c("0","1"), labels = c(NA,"Don't Know"))
data$tech_kid_os___1 = factor(data$tech_kid_os___1,levels=c("0","1"), labels = c(NA,"iPhone or iPad"))
data$tech_kid_os___2 = factor(data$tech_kid_os___2,levels=c("0","1"), labels = c(NA,"Android/Google"))
data$tech_kid_os___3 = factor(data$tech_kid_os___3,levels=c("0","1"), labels = c(NA,"Windows"))
data$tech_kid_os___4 = factor(data$tech_kid_os___4,levels=c("0","1"), labels = c(NA,"Blackberry"))
data$tech_kid_os___5 = factor(data$tech_kid_os___5,levels=c("0","1"), labels = c(NA,"Other"))
data$tech_kid_os___6 = factor(data$tech_kid_os___6,levels=c("0","1"), labels = c(NA,"Unsure"))
data <- cleaner("tech_kid_brand___1", "tech_kid_brand___7", data)
data <- cleaner("tech_kid_os___1", "tech_kid_os___6", data)
Create some summary statistics for PedsQL survey, group by subsections
#get some row means for the survey, pass the dataframe, first and last column (inclusive) to be included
rowmeaning <- function(info, col1, col2){
  store <- info[grep(paste("^", col1, "$", sep = ""), colnames(info)):grep(paste("^", col2, "$", sep = ""), colnames(info))]
  info <- rowMeans(store)
  info[rowSums(is.na(store)) >= 0.5] <- NA
  info[which(rowSums(is.na(store)) / ncol(store) >= 0.5)] <- NA
  info
}

#create subgroups for pedsql
data <- survey_columns(info = data, list = c("qlPtTotal", "qlPtPhys",  "qlPtEmotion", "qlPtSocial", "qlPtSchool", "qlParTotal", "qlParPhys", "qlParEmotion", "qlParSocial", "qlParSchool"))
label(data$asthma15)="If a child dies from an asthma attack, this usually means that there was no time to start any treatment."

#calculate summary calculation for pedsql filled out by the patients
data$qlPtTotal <- rowmeaning(data, "pedsqlkids_01", "pedsqlkids_23")
data$qlPtPhys <- rowmeaning(data,"pedsqlkids_01","pedsqlkids_08")
data$qlPtEmotion <- rowmeaning(data, "pedsqlkids_09", "pedsqlkids_13")
data$qlPtSocial <- rowmeaning(data, "pedsqlkids_14", "pedsqlkids_18")
data$qlPtSchool <- rowmeaning(data, "pedsqlkids_19", "pedsqlkids_23")

#calculate summary calculation for pedsql filled out by the parents
data$qlParTotal <- rowmeaning(data, "pedsqlparent_01", "pedsqlparent_23")
data$qlParPhys <- rowmeaning(data,"pedsqlparent_01","pedsqlparent_08")
data$qlParEmotion <- rowmeaning(data, "pedsqlparent_09", "pedsqlparent_13")
data$qlParSocial <- rowmeaning(data, "pedsqlparent_14", "pedsqlparent_18")
data$qlParSchool <- rowmeaning(data, "pedsqlparent_19", "pedsqlparent_23")
Label data, most of this was done by REDCap
Create the demographics table
#making some tables of basic descriptive data
list1 <- c("redcap_id", "Obese?", "Asthma?", "dem_gender", "dem_grade", "Ethnicity", "dem_language_pref", "tech_kid_device_time", "tech_kid_brand___1", "tech_kid_os___1", "dem_language_home", "dem_parent_habitants", "dem_parent_sex", "dem_parent_ethnicity", "dem_parent_language_pref", "dem_parent_marital", "dem_parent_us", "dem_parent_income", "dem_parent_job", "dem_parent_school")

#select those columns
demotable <- completeFirst[which(colnames(completeFirst) %in% list1)][, list1]

list2 <- c("id", "Is child obese", "Does child have asthma", "Gender of child", "Grade of child", "Ethnicity of child", "Child's preferred language", "Time child spends on device", "Technology brand used", "Technology OS used", "Language spoken at home", "Number of people living at home", "Gender of parent", "Ethnicity of parent", "Parent's preferred language", "Parent's Marital Status", "How long has parent been in US", "Parent's Income", "Parent's Employment", "Parent's highest school completed")

#rename the columns something nicer for Table 1
colnames(demotable) <- list2

tablelist <- c('<table><tr>')

tablelist <- c(tablelist, '<td valign="top" width="25%"><table>')
for(i in 2:8){ tablelist <- c(tablelist, '<tr><td>', kable(demotable %>% group_by_at(list2[i]) %>% summarise(" " = n()) %>% as.data.frame(.), format='html', escape = TRUE, position="centered") %>% kable_styling(full_width = TRUE), '</td></tr>') }

tablelist <- c(tablelist, '</table></td><td valign="top" width="25%"><table>')
for(i in 9:12){ tablelist <- c(tablelist, '<tr><td>', kable(demotable %>% group_by_at(list2[i]) %>% summarise(" " = n()) %>% as.data.frame(.), format='html', escape = TRUE, position="centered") %>% kable_styling(full_width = TRUE), '</td></tr>') }

tablelist <- c(tablelist, '</table></td><td valign="top" width="25%"><table>')
for(i in 13:17){ tablelist <- c(tablelist, '<tr><td>', kable(demotable %>% group_by_at(list2[i]) %>% summarise(" " = n()) %>% as.data.frame(.), format='html', escape = TRUE, position="centered") %>% kable_styling(full_width = TRUE), '</td></tr>') }

tablelist <- c(tablelist, '</table></td><td valign="top" width="25%"><table>')
for(i in 18:20){ tablelist <- c(tablelist, '<tr><td>', kable(demotable %>% group_by_at(list2[i]) %>% summarise(" " = n()) %>% as.data.frame(.), format='html', escape = TRUE, position="centered") %>% kable_styling(full_width = TRUE), '</td></tr>') }

tablelist <- c(tablelist, '</table></td></tr></table>')
Plot PedsQL data, Nutrition/Exercise Survey, Weight Surveys, mark the significant plots
#applies ttest to column that ends in .x and .y (usually after merging two datasets)
ttest <- function(info, col){
  t.test(info[[col]], info[[gsub(".x", ".y", col)]], paired=TRUE, alternative = "two.sided")
}

#trying to make easier fxn to apply ttest, applies ttest to all columns between col1 and col2 inclusive
applying <- function(col1, col2, info){
  info <- as.data.frame(lapply(select(info, col1:col2, gsub(".x", ".y", col1):gsub(".x", ".y", col2)), as.numeric))
  info <- t(sapply(colnames(select(info, col1:col2)), ttest, info=info)) %>%
    as.data.frame() %>%
    add_column(., "Name" = gsub(".x", "", row.names(.)), .before = "statistic")
}


#boxplot for significant changes from first and last visits
boxy <- function(info, name, sig, y, offset){
  info[[name]] <- as.numeric(info[[name]])
  text_color <- "black"
  text_background <- if_else(sig[name,]$p.value <= 0.05, "red", "gray")
  ggplot(info, aes_string(x = "redcap_event_name", y=name)) + 
    geom_boxplot() +
    theme(text = element_text(size = 25), axis.text = element_text(size = 25)) +
    labs(x = "", y="", title = str_wrap(label(complete[grep(name, colnames(complete))]), width=45)) +
    annotate("text", x=1.5, y=y, label = paste("T-Test: t=", toString(round(as.numeric(sig[name,]$statistic), digits=5)), "\nP-value: ", toString(round(as.numeric(sig[name,]$p.value), digits=5))), color = text_color, size=7) + 
    annotate("rect", xmin=1.2, xmax=1.8, ymin=y-offset, ymax=y+offset, alpha=0.2, fill=text_background)
}

#get list of significant results for pedsql from patient, then plot
sig <- applying("qlPtTotal.x", "qlPtSchool.x", combined)
pqlpatientplot <- lapply(colnames(select(complete, "qlPtTotal":"qlPtSchool")), boxy, info = firstLast, sig = sig, y = 35, offset=7)

#plot list of significant results for pedsql by parent, then plot
sig <- applying("qlParTotal.x", "qlParSchool.x", combined)
pqlparentplot <- lapply(colnames(select(complete, "qlParTotal":"qlParSchool")), boxy, info = firstLast, sig = sig, y = 35, offset=7)

#get significant results for first and last visit for activity and nutrition surveys
sig <- applying("act_nutri_kids01.x", "act_nutri_kids27.x", combined)
actNutriKids <- lapply(sig[(sig$p.value <= 0.1),]$Name, boxy, info = firstLast, sig = sig, y = 8, offset=1)

#get significant results for first and last visit for activity and nutrition surveys from parents
sig <- applying("act_nutri_parent01.x", "act_nutri_parent31.x", combined)
actNutriParents <- lapply(sig[(sig$p.value <= 0.1),]$Name, boxy, info = firstLast, sig = sig, y = 8, offset=1)

#get significant results for first and last visit for weight surveys from patient
sig <- applying("weight_01.x", "weight_16.x", combined)
weightPts <- lapply(sig[(sig$p.value <= 0.1),]$Name, boxy, info = firstLast, sig = sig, y = 8, offset=1)
Linear Regression lines for prior to study and after study for weight
weights=read.csv('weights.csv')
weights$date <- as.Date(as.character(weights$date))
colnames(weights)[ncol(weights)] <- "days"
weights$days <- weights$date - as.Date("2014-01-01")
prestudy <- subset(weights, date < "2015-09-01")
poststudy <- subset(weights, date >= "2015-09-01")
prestudy <- droplevels(prestudy)
poststudy <- droplevels(poststudy)

#this just groups the data into sets pre and post start of the study, then gets coefficients for linear regression
lining <- function(info) {
  measures <- as.data.frame(levels(info$redcap_id))
  colnames(measures) <- "redcap_id"
  for(ins in levels(info$redcap_id)) {
    measures$slope[which(measures$redcap_id==ins)] <- lm(weight ~ days, data=subset(info, redcap_id==ins))$coefficients[2]
    measures$intercept[which(measures$redcap_id==ins)] <- lm(weight ~ days, data=subset(info, redcap_id==ins))$coefficients[1]
  }
  return(measures)
}

#plots all of the weights, pre and post beginning of study, along with linear regression lines for that part
plotting <- function(ins, pre, post) {
    if(length(which(pre$redcap_id==ins)) > 1) {
      precoeff <- lm(weight ~ days, data=subset(pre, redcap_id==ins))$coefficients
      precoeffs <- c(0, precoeff[1], 608, precoeff[1]+608*precoeff[2])
    } else {
      precoeffs <- c(0,0,0,0)
    }
    if(length(which(post$redcap_id==ins)) >  1) {
      postcoeff <- lm(weight ~ days, data=subset(post, redcap_id==ins))$coefficients
      postcoeffs <- c(608, postcoeff[1]+postcoeff[2]*608, 1000, postcoeff[1]+1000*postcoeff[2])
    } else {
      postcoeffs <- c(0,0,0,0)
    }
    ggplot() +
      geom_segment(aes(x = precoeffs[1], y =precoeffs[2], xend=precoeffs[3], yend=precoeffs[4], color = "blue4"), data=pre) +
      geom_segment(aes(x = postcoeffs[1], y =postcoeffs[2], xend=postcoeffs[3], yend=postcoeffs[4], color = "firebrick1"), data=post) +
      theme(legend.position="none") +
      theme(axis.title.x=element_blank()) +
      theme(axis.title.y=element_blank())
}

#collect the groups, get linear regression lines, store coefficients
measures <- merge(lining(prestudy), lining(poststudy), by = "redcap_id", all.y=TRUE)
measures$group <- if_else(grepl("Control", measures$redcap_id), "control", "study")
studyprepost <- subset(measures, group=="study")
controlprepost <- subset(measures, group=="control")

#create graphs for patient weights, fit segments of trend lines to weights before and after study started
wtplots <- lapply(levels(weights$redcap_id), plotting, pre=prestudy, post=poststudy)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
## The following object is masked from 'package:dplyr':
## 
##     combine
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

## 
## [[8]]

## 
## [[9]]

## 
## [[10]]

## 
## [[11]]

## 
## [[12]]

## 
## [[13]]

## 
## [[14]]

## 
## [[15]]

## 
## [[16]]

## 
## [[17]]

## 
## [[18]]

## 
## [[19]]

## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

## 
## [[8]]

## 
## [[9]]

## 
## [[10]]

## 
## [[11]]

## 
## [[12]]

## 
## [[13]]

## 
## [[14]]

## 
## [[15]]

## 
## [[16]]

## 
## [[17]]

## 
## [[18]]

## 
## [[19]]

## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

## 
## [[8]]

## 
## [[9]]

## 
## [[10]]

## 
## [[11]]

## 
## [[12]]

## 
## [[13]]

## 
## [[14]]

## 
## [[15]]

## 
## [[16]]

## 
## [[17]]

## 
## [[18]]

## 
## [[19]]

Results

Describe your results and include relevant tables, plots, and code/comments used to obtain them. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you’d like, but this is not required.

Is child obese
Yes 15
No 3
Does child have asthma
Yes 6
No 12
Gender of child
Female 8
Male 10
Grade of child
<= 5th Grade 9
6-8th Grade 7
9-12th Grade 2
Ethnicity of child
Black, Non-Hispanic 6
Hispanic 12
Child’s preferred language
English 14
Creole 2
Spanish 2
Time child spends on device
1 hour 1
1 - 2 hours 3
2 - 3 hours 1
3 - 4 hours 3
5 - 6 hours 1
NA 9
Technology brand used
Don’t Know 1
iPhone or iPadiPhone or iPad 3
Other 4
Samsung 2
NA 8
Technology OS used
Android/Google 3
iPhone or iPadiPhone or iPad 3
Other 1
Unsure 1
Windows 1
NA 9
Language spoken at home
English 5
Creole 2
Spanish 11
Number of people living at home
2 people 1
3 people 2
4 people 4
5 people 4
6 people 2
7 people 2
NA 3
Gender of parent
Female 13
Male 1
NA 4
Ethnicity of parent
Black, Non-Hispanic 3
Hispanic 11
Other 1
NA 3
Parent’s preferred language
English 3
Creole 2
Spanish 10
NA 3
Parent’s Marital Status
Single 5
Married 9
Divorced 1
NA 3
How long has parent been in US
Less than 1 year 1
1-3 years 3
4-6 years 1
7-9 years 3
10-12 years 1
More than 15 years 6
NA 3
Parent’s Income
Below $10,000 3
$10,000 - $24,999 9
$25,000 - $39,999 2
$40,000 - $54,999 1
NA 3
Parent’s Employment
Full Time 3
Part Time 7
Unemployed 5
NA 3
Parent’s highest school completed
9th to 11th grade (some high school) 5
12th grade or GED (high school graduate) 4
1 to 3 years college (some college or technical school) 5
Graduate training or professional degree 1
NA 3
## [1] 0.149661
## [1] 0.2752352
## [1] 0.5451327
## [1] 0.1349553
## 
##  Welch Two Sample t-test
## 
## data:  studyprepost$diff and controlprepost$diff
## t = 0.23032, df = 8.6634, p-value = 0.8232
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.06584312  0.08067238
## sample estimates:
##   mean of x   mean of y 
## -0.01730112 -0.02471575